

Search for: All records

Creators/Authors contains: "Lee, Jason"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available December 9, 2026
  2. A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a “large-sample” regime, imposing an enormous burn-in cost for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of MVP (Monotonic Value Propagation), an optimistic model-based algorithm proposed by Zhang et al. [82], achieves a regret on the order of (modulo log factors)\begin{equation*} \min \big\lbrace \sqrt{SAH^3 K},\, HK \big\rbrace, \end{equation*} where S is the number of states, A is the number of actions, H is the horizon length, and K is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample sizes K ≥ 1, essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield ε-accuracy) of \(\frac{SAH^3}{\varepsilon^2}\) up to log factors, which is minimax-optimal for the full ε-range (see the worked conversion sketched after this list). Further, we extend our theory to unveil the influence of problem-dependent quantities such as the optimal value/cost and certain variances. The key technical innovation lies in a novel analysis paradigm (based on a new concept called “profiles”) to decouple the complicated statistical dependency across sample trajectories, a long-standing challenge facing the analysis of online RL in the sample-starved regime.
    Free, publicly-accessible full text available May 2, 2026
  3. Free, publicly-accessible full text available April 30, 2026
  4. Free, publicly-accessible full text available April 24, 2026
  5. Free, publicly-accessible full text available December 1, 2025
  6. Free, publicly-accessible full text available December 15, 2025
  7. Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks and provide a rich template to study statistical and computational trade-offs in the high-dimensional regime (an illustrative instance is sketched after this list). While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension d, we show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require Ω(d^{k⋆/2}) samples, where k⋆ is a “generative” exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-to-statistical gap (under both the SQ and LDP classes) whenever k⋆ > 2. To complete the study, we construct smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents k⋆.
  8. We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form h = g ∘ p, where p is a degree-k polynomial and g is a degree-q polynomial (a toy instance of this class is sketched after this list). This function class generalizes the single-index model, which corresponds to k = 1, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree-k polynomials p, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target h up to vanishing test error in Õ(d^k) samples and polynomial time. This is a strict improvement over kernel methods, which require Ω(d^{kq}) samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of p being a quadratic. When p is indeed a quadratic, we achieve the information-theoretically optimal sample complexity Õ(d^2), which is an improvement over prior work (Nichani et al., 2023) requiring a sample size of Õ(d^4). Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature p with Õ(d^k) samples. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, learn a broad class of hierarchical functions.
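A worked conversion for item 2 (a sketch based only on the bounds quoted in the abstract, not text from the paper): by the standard online-to-batch argument, the average per-episode suboptimality after K episodes is at most Regret(K)/K, so the stated regret bound gives
\begin{equation*} \frac{1}{K}\,\mathrm{Regret}(K) \;\lesssim\; \sqrt{\frac{SAH^3}{K}} \;\le\; \varepsilon \quad\Longleftrightarrow\quad K \;\gtrsim\; \frac{SAH^3}{\varepsilon^2}, \end{equation*}
which recovers the \(\frac{SAH^3}{\varepsilon^2}\) PAC sample complexity stated in the abstract, up to log factors.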
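An illustrative instance for item 7 (a minimal sketch, not the paper's algorithm; the link function, dimensions, and estimator below are placeholder choices): it generates data from a single-index model, where labels depend on the input only through one hidden direction, and applies the simplest first-moment estimator, which only succeeds in the easiest case where the link has a non-zero first Hermite coefficient.

    # Minimal single-index-model sketch: labels depend on x only through <w_star, x>.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 50, 10_000                       # ambient dimension, sample size
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)        # hidden unit direction to be recovered

    def link(z):
        # Placeholder non-linear link; the paper allows generic, possibly random links.
        return np.tanh(z) + 0.5 * z**2

    X = rng.standard_normal((n, d))         # isotropic Gaussian inputs
    y = link(X @ w_star) + 0.1 * rng.standard_normal(n)   # noisy labels

    # Toy estimator: the first-moment statistic (1/n) * sum_i y_i * x_i aligns with w_star
    # whenever the link has a non-zero first Hermite coefficient (the easiest, k* = 1, case);
    # for larger generative exponents, the abstract's Omega(d^{k*/2}) sample barrier applies
    # to SQ/LDP methods.
    w_hat = X.T @ y / n
    w_hat /= np.linalg.norm(w_hat)
    print("alignment |<w_hat, w_star>|:", abs(w_hat @ w_star))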
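A toy instance for item 8 (a minimal sketch, not from the paper; the particular p, g, and dimensions are placeholder choices): it writes down one member of the hierarchical target class h = g(p(x)) over standard Gaussian inputs.

    # Hierarchical target h = g o p: an outer scalar polynomial g composed with an
    # inner multivariate polynomial feature p of the input.
    import numpy as np

    rng = np.random.default_rng(1)
    d = 20

    def p(x):
        # Placeholder inner feature: a degree-3 polynomial of x.
        return x[..., 0] * x[..., 1] * x[..., 2] + x[..., 3]

    def g(z):
        # Placeholder outer link: a degree-2 polynomial of the scalar feature.
        return 2.0 * z**2 - z + 1.0

    def target(x):
        return g(p(x))                      # hierarchical target h = g(p(x))

    X = rng.standard_normal((5, d))         # standard Gaussian inputs, as in the abstract
    print(target(X))
    # The single-index model is the special case where p is linear (degree k = 1).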